Speech-Recognition Interfaces for Music Information Retrieval: 'Speech Completion' and 'Speech Spotter'

Authors

  • Masataka Goto
  • Katunobu Itou
  • Koji Kitayama
  • Tetsunori Kobayashi

Abstract

This paper describes music information retrieval (MIR) systems featuring automatic speech recognition. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. We propose two different speech-recognition interfaces for MIR, speech completion and speech spotter, and describe two MIR-based hands-free jukebox systems that enable a user to retrieve and play back a musical piece by saying its title or the artist’s name. The first is a music-retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user can remember only part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music-playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Our experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces.

Similar resources

Speech Interface Exploiting Nonverbal Information

This paper introduces our research on speech interfaces using nonverbal information and examines new possibilities in speech interfaces. Although speech information consists of verbal and nonverbal information, most speech-recognition research has made use of only verbal information. From among nonverbal information, we have focused on hesitation (filled pause) and prosody (voice pitch) to crea...

Speech Spotter: On-demand Speech Recognition in Human-Human Conversation on the Telephone or in Face-to-Face Situations / Masataka Goto

This paper describes a novel speech-interface function, called “speech spotter”, which enables a user to enter voice commands into a speech recognizer in the midst of natural human-human conversation. In the past, it has been difficult to use automatic speech recognition in human-human conversation since it was not easy to judge, from only microphone input, whether a user was speaking to anothe...

Speech Interface Exploiting Intentionally-Controlled Nonverbal Speech Information

This paper describes our research on speech interfaces using nonverbal speech information. Although speech information consists of verbal and nonverbal information, most speech-recognition research has made use of only verbal information such as words and sentences. From among nonverbal information, we have focused on hesitation (filled pause) and prosody (voice pitch) to create four speech-inte...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Publication date: 2004